A research-led deep dive into
one of America's most-used apps.
This project is a structured usability evaluation of the Walmart mobile app, conducted to understand where a product used by millions every day has room to better serve its users. Using a combination of qualitative interviews, observational methods, and validated usability metrics, I investigated how real users navigate the app's core tasks and where their experience could be smoother. The findings reveal a pattern of small but compounding friction points and a clear roadmap to address them.
When convenience
isn't convenient.
The Walmart app is used by millions across the US for delivery, pickup, and in-store shopping. As a team of international students who started using the app in 2024, we ran into friction almost immediately — switching delivery modes, finding past orders, and navigating a cluttered interface all proved harder than expected for an app serving such a critical daily need.
Our core question
How usable is the Walmart app, really?
We set out to evaluate user interactions across the app's primary tasks, examining usability and satisfaction levels, identifying design challenges, and generating actionable recommendations for improvement. The goal was to understand not just what was frustrating users, but why, and how it could be fixed.
Research Objectives
Six research objectives guided our evaluation, helping us look at the app from multiple angles — from mental models to accessibility to measurable satisfaction scores.
Who uses the Walmart app?
Walmart serves over 240 million customers weekly across its stores and digital platforms, from working families and budget-conscious students to professionals and seniors who rely on it for everyday essentials.
The persona below represents the user type most central to our study: someone who uses the app frequently, with specific goals and real friction standing in the way.
Profile
User Story
Amy is a busy student juggling academics and other responsibilities. Convenience is her top priority — she's relatively new to the Walmart ecosystem and visits the app about twice a week. While she prefers pickup or in-store shopping, she leans on delivery when her schedule gets hectic. She relies on signals like "Popular Pick" and "Best Seller" to make quick, confident decisions.
Goals
Pain Points
How we gathered the data
We used four complementary methods to build a complete picture of the app's usability: qualitative depth from interviews, pattern recognition from affinity mapping, real-time behavioral data from think-aloud sessions, and a standardized quantitative score from the SUS.
Semi-Structured Interviews
Open-ended conversations that let participants tell us about their real experiences — what they use the app for, what they love, and where they get stuck.
Qualitative · 10 participants
- Conducted both in-person and virtually to accommodate all participants
- Questions focused on usage patterns, pain points, and feature familiarity
Affinity Mapping
A collaborative synthesis exercise where we grouped raw interview data into themes to surface recurring patterns and shared frustrations.
Synthesis · Miro
- All 10 interviews were coded into individual data points on sticky notes
- Resulted in 6 themes: Usage Patterns, Demographics, Shopping Behavior, Useful Features, UI Challenges, Suggestions
Think-Aloud Protocol
Participants verbalized their thoughts in real-time while completing 3 structured tasks, letting us observe exactly where the interface caused confusion or delay.
Observational · 10 sessions
- 3 tasks designed from affinity mapping themes; each timed with pilot-study benchmarks
- Success measured by: time on task, number of clicks, and task completion rate
System Usability Scale (SUS)
A validated 10-question questionnaire administered after the think-aloud sessions, giving our findings a measurable, benchmarkable usability score.
Quantitative · Benchmark: 67
- Each participant completed the SUS immediately after finishing all 3 tasks
- Average score calculated across all participants for a group benchmark comparison
The Interviews
We started with semi-structured interviews because they're the most direct way to surface friction — open-ended enough for participants to share what they actually experience, structured enough to keep things focused. Talking to users first grounded everything that came after: the tasks we designed, the patterns we mapped, the hypotheses we tested.
We spoke with 10 participants: a mix of students, working professionals, and homemakers across the 20–40 age range, spanning both novice and experienced Walmart users. Sessions were held in-person and virtually, each lasting 45–50 minutes.
Our interviews were structured around eight core themes — each one designed to pull out a different layer of the user experience, from first impressions to long-held frustrations.
Age, occupation, and which Walmart platform and device they use
How long they've used Walmart, how frequently, and their typical shopping mode (delivery, pickup, in-store)
How they place an order, how they navigate the app, and what the process feels like
Challenges and frustrations encountered — features that are hard to find or understand
Most useful features and why — including filters and how well they work
Whether they subscribe to Walmart+ and the reasoning behind that decision
If they could change one thing about the app — what would it be and why
Anything the participant felt wasn't covered — their chance to add unprompted thoughts
Affinity Mapping
Once the interviews wrapped up, we had a wealth of raw observations, quotes, and patterns — but they needed structure. Affinity mapping was how we made sense of it all. Every key insight from each session was written out individually and then collaboratively grouped by the team on Miro, letting us see which issues were isolated and which were systemic.
We ended up with six distinct themes that became the backbone of everything that followed.
Fig. Affinity Map
The process was intentionally non-prescriptive at first — we let the data lead. Groupings emerged from repeated themes across participants rather than from assumptions we brought in. The result was a shared visual language of user behavior that the entire team could point to, debate, and build on.
Think-Aloud Protocol
Our participants — graduate students with varying levels of experience using the app — completed a retrospective think-aloud session, reflecting on each task after performing it. This method captured both their immediate reactions to the interface and deeper insights on what challenged or surprised them. Based on a pilot study we conducted beforehand, we set time benchmarks for each task. We recorded three performance metrics across every session to evaluate both efficiency and usability.
Task Success Rate
Whether the participant completed the task
Time on Task
How long it took to complete each task
Number of Clicks
How directly participants navigated to the goal
Below are the three tasks we defined to conduct our think-aloud sessions.
Switch between shopping modes — from delivery to pickup
Search for a product and add the most relevant non-sponsored item to the cart
Reorder a previously purchased item from the user's order history
Each session was documented as it happened. What started as raw notes became the structured list of issues below — a direct translation from observation to insight.
Fig. Think-Aloud Session Data
Fig. Participant Issues Identified
Validating Our Sample
The Five Rater Method
To validate that our participant sample was sufficient to surface meaningful issues, we applied the Five Rater Method — a standard formula in usability research for estimating issue detection probability.
With just five evaluators conducting the same study, there is a 65% probability of detecting the same issues; our larger 10-participant sample therefore gave us strong confidence that we surfaced the core usability problems in the Walmart app.
Fig. Five Raters Chart
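The estimate above comes from the standard problem-discovery model used in usability research: if each evaluator independently detects a given issue with probability p, then n evaluators surface it with probability 1 − (1 − p)^n. A minimal sketch (the per-evaluator rate p ≈ 0.19 is an assumption chosen to reproduce the 65% figure, not something we measured directly):

```python
def detection_probability(p: float, n: int) -> float:
    """Probability that an issue is found at least once by n independent
    evaluators, each detecting it with probability p: 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n

# With an assumed per-evaluator detection rate of ~0.19, five evaluators
# surface a given issue about 65% of the time.
print(round(detection_probability(0.19, 5), 2))   # 0.65
# The same assumed rate with 10 evaluators pushes detection higher still.
print(round(detection_probability(0.19, 10), 2))  # 0.88
```

Note how quickly the curve flattens: most of the detection probability is earned by the first handful of evaluators, which is why small usability samples remain defensible.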
The SUS Questionnaire
After completing the think-aloud tasks, participants filled out the 10-question System Usability Scale (SUS) questionnaire, a validated and widely used instrument for measuring perceived usability. Its statements alternate between positive and negative framing to reduce response bias, and each is rated on a 1–5 agreement scale.
Fig. SUS Response Data
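SUS scoring follows a fixed recipe: odd-numbered (positively worded) items contribute (response − 1), even-numbered (negatively worded) items contribute (5 − response), and the total is multiplied by 2.5 to land on a 0–100 scale. A minimal sketch of that calculation (the example responses are illustrative, not participant data):

```python
def sus_score(responses: list[int]) -> float:
    """Compute one participant's SUS score from ten 1-5 responses.

    Odd-numbered items (positively worded) contribute (response - 1);
    even-numbered items (negatively worded) contribute (5 - response).
    The summed contributions are scaled by 2.5 onto a 0-100 range.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd-numbered)
        for i, r in enumerate(responses)
    )
    return total * 2.5

# A participant answering 4 on every positive item and 2 on every negative
# item contributes 3 points per item: 30 * 2.5 = 75.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

The group benchmark is then just the mean of the per-participant scores.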
All participants gave explicit permission for their data to be used and shared as part of this research study.
What the data revealed —
and what to do about it.
The SUS score, task completion rates, and think-aloud observations all pointed to the same conclusion: the Walmart app has real, measurable usability gaps. Here's what I found, and four concrete recommendations to address them.
System Usability Scale (SUS)
Below the industry standard of 67 — confirmed with 88% statistical confidence.
To check how reliable our average score was, we computed a confidence interval using the t-distribution, the standard approach when working with small samples. It told us that we can be 90% confident the true average score sits between 55.49 and 68.51. More importantly, there is an 88% chance that the actual score is below the industry standard of 67, which means our finding isn't a fluke. The app's usability problem is real and statistically supported.
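For readers who want the mechanics, the interval has the familiar one-sample form x̄ ± t* · s/√n. The sketch below uses a hypothetical sample standard deviation of ~11.2, chosen only to illustrate the shape of the computation, so its bounds are close to but not identical to the reported figures:

```python
import math

# 95th percentile of the t-distribution with df = 9, i.e. the critical
# value for a 90% two-sided interval with n = 10 (standard table value).
T_CRIT_90_DF9 = 1.833

def t_confidence_interval(mean: float, sd: float, n: int,
                          t_crit: float = T_CRIT_90_DF9) -> tuple[float, float]:
    """One-sample t confidence interval: mean +/- t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

# Hypothetical summary statistics: mean SUS of 62 across n = 10 participants
# with a sample standard deviation of ~11.2 (illustrative, not our raw data).
low, high = t_confidence_interval(62, 11.2, 10)
print(f"90% CI: [{low:.2f}, {high:.2f}]")  # 90% CI: [55.51, 68.49]
```

The probability that the true mean falls below 67 comes from the same distribution, evaluated at t = (67 − x̄)/(s/√n) with df = n − 1.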
At a Glance
Think-Aloud Performance Analysis
Each participant attempted three timed tasks while verbalizing their experience. The rings below show how many successfully completed each task, how long it took on average, and how likely it is that the broader user population would pass, giving us both a behavioral and statistical view of where the app holds up and where it doesn't.
Task 1 — Mode Switch
Switch delivery → pickup
Task 2 — Product Search
Find & add non-sponsored item
Task 3 — Reorder
Reorder a past purchase
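One common way to estimate the "would the broader population pass?" figure from a small sample is the adjusted-Wald (Agresti–Coull) interval for a completion rate; we name it here as an illustration, since the write-up above doesn't pin down the exact method. A sketch with an illustrative 6-of-10 completion count, not a result from our sessions:

```python
import math

def adjusted_wald(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted-Wald (Agresti-Coull) confidence interval for a task
    completion rate, a common recommendation for small usability samples.

    The raw proportion is shifted toward 0.5 by adding z^2/2 pseudo-successes
    and z^2 pseudo-trials before computing a normal-approximation interval.
    """
    p_adj = (successes + z ** 2 / 2) / (n + z ** 2)
    half = z * math.sqrt(p_adj * (1 - p_adj) / (n + z ** 2))
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

# Hypothetical: 6 of 10 participants complete a task. Even a "60% pass rate"
# leaves a wide plausible range for the population at n = 10.
low, high = adjusted_wald(6, 10)
print(f"95% CI for completion rate: [{low:.2f}, {high:.2f}]")
```

The width of that interval is exactly why we paired completion counts with timing, click data, and the SUS rather than leaning on any single number.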
Key Findings from Interviews & Affinity Mapping
Patterns that surfaced repeatedly across interviews and affinity mapping.
Navigation Challenges
The delivery/pickup mode switcher was nearly invisible to users. 40% couldn't find it at all, and those who did exceeded the benchmark time.
Clarity & Labeling
"My Items" confused 70% of participants who expected "Order History" under their Account — not an ambiguous menu buried in the tab bar.
Accessibility Concerns
International users' Forex cards were flagged as fraudulent — a real, recurring barrier for newcomers who are exactly the app's growing user base.
Below-Average SUS Score
62/100 with a steep learning curve. Even experienced users reported the interface felt challenging and unintuitive for basic tasks.
Feature Bright Spots
The in-store locator and barcode scanner earned genuine praise. These features worked intuitively and users sought them out proactively.
Personalization Gap
Users wanted smarter product recommendations and clearer feedback on delivery status and Walmart+ payment confirmation.
Design Recommendations
Four targeted fixes that would meaningfully improve usability based on what we observed.
Increase Visibility for Switching Shopping Modes
Add larger buttons or distinctive icons for delivery/pickup at the top of the homepage and cart screen. Use color-coding and in-app tooltips for novice users. A persistent banner at checkout highlighting available modes and a clear call-to-action would significantly reduce mode-switching failures.
40% of users could not find the option to switch shopping modes
Streamline the Reordering Process
Rename "My Items" → "Order History" for immediate clarity. Shift predictive reordering — which currently sits at the very bottom of a long homepage scroll — to a prominent, above-the-fold position based on purchase history patterns.
70% of users struggled to locate reorder options under "My Items"
Reduce Sponsored Product Interference
Introduce a filter option to hide sponsored products from search results. Decrease their frequency to better balance advertising revenue goals with a clear, trustworthy shopping experience — especially for users actively trying to avoid them.
30% of users were disrupted by sponsored items dominating search results
Improve Performance Consistency Across Devices
Optimize Android-specific performance by reducing load times and ensuring a consistent frame rate. Conduct regular parity testing on both iOS and Android to eliminate the responsiveness gap that Android users experienced compared to iPhone users.
30% of Android users noticed interaction delays compared to iPhone
What this study taught me.
Running a rigorous evaluation of a real, widely-used app taught me more than any classroom exercise could. Here's what I took away, about research, about design, and about the responsibility that comes with building products people depend on.
"Navigation, labelling, and engagement — seemingly simple things — have an enormous impact on whether users can actually accomplish what they came to do."
— Nielsen, J. (1994). Usability Engineering. Morgan Kaufmann.
The Walmart app is genuinely essential for millions of people. Its affordable prices and wide product range make it a daily necessity — not a luxury. That reliance makes getting the UX right not a design nicety, but a real responsibility. My research proved there's a clear, fixable gap between what users need and what the app currently delivers.
Users rationalize confusion
Several participants blamed themselves when they couldn't find a feature. The think-aloud protocol made it clear the interface, not the user, was the problem.
Designing for the international user
Payment failures and unfamiliar UI patterns hit international users the hardest. Inclusive design isn't just accessibility — it's accounting for users new to a country's systems entirely.
Mixed methods catch what one method misses
Interviews surfaced frustrations users could articulate. Think-aloud revealed behaviors they couldn't. The SUS gave us a number to stand behind. Each method made the others stronger.
Labels carry enormous weight
A single word — "My Items" vs "Order History" — was the difference between task success and failure for 70% of our participants. Copy is design.
Research is a team sport
The affinity mapping session was one of the most valuable parts of the process. Working through the data collaboratively with the team produced insights none of us would have found alone. The best patterns emerged from disagreement, not consensus.
TL;DR?
The Numbers.
As a new international student in 2024, I quickly learned that Walmart was the go-to for affordable shopping. But the app? Not so much. So I ran a study with my team to figure out exactly where it was failing its millions of users.
shopping mode
reorder items
industry standard = 67
The Story.
The essentials: what went wrong, how we found out, what we discovered, and what comes next.
The Problem
😔 An app used by millions felt clunky and unintuitive
Basic tasks like switching shopping modes or reordering items turned into frustrating hunts — especially for users new to the platform.
What we did
🗣️ We talked to users and watched them work in real-time
10 interviews, affinity mapping, think-aloud sessions, and a System Usability Scale survey. Industry-standard methods applied throughout.
What we found
👥 Users struggled with navigation and unclear labels
Users consistently missed key features. The app scored 62/100, way below the industry standard of 67. T-distribution confirmed the result statistically.
What's next?
📱 Small, targeted changes could transform the experience
Better mode switching visibility, clearer labels, fewer sponsored interruptions, and Android performance improvements. The fixes are specific and actionable.
Still curious? Every number, quote, and pattern above has a full story behind it — scroll back to the top ↑ and read the whole thing.